17 research outputs found

    FLatten Transformer: Vision Transformer using Focused Linear Attention

    The quadratic computational complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks. Linear attention, on the other hand, offers a much more efficient alternative with its linear complexity by approximating the Softmax operation through carefully designed mapping functions. However, current linear attention approaches either suffer from significant performance degradation or introduce additional computation overhead from the mapping functions. In this paper, we propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness. Specifically, we first analyze the factors contributing to the performance degradation of linear attention from two perspectives: focus ability and feature diversity. To overcome these limitations, we introduce a simple yet effective mapping function and an efficient rank restoration module to enhance the expressiveness of self-attention while maintaining low computational complexity. Extensive experiments show that our linear attention module is applicable to a variety of advanced vision Transformers and achieves consistently improved performance on multiple benchmarks. Code is available at https://github.com/LeapLabTHU/FLatten-Transformer. Comment: ICCV 202
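The complexity argument in this abstract can be illustrated with a minimal sketch. It assumes a generic non-negative feature map (ReLU plus a small constant) as a stand-in; it is not the focused mapping function or the rank restoration module proposed in the paper, and all function names are hypothetical. The point is only that reordering the matrix products avoids building the N x N score matrix while giving the same output.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def feature_map(m, eps=1e-6):
    # Assumption: a generic non-negative map (ReLU + eps), NOT the paper's function.
    return [[max(x, 0.0) + eps for x in row] for row in m]


def normalize(num, den):
    """Divide each row of num by the matching scalar in den."""
    return [[x / d for x in row] for row, d in zip(num, den)]


def attention_quadratic(q, k, v):
    """(phi(Q) phi(K)^T) V: materializes an N x N score matrix."""
    fq, fk = feature_map(q), feature_map(k)
    scores = matmul(fq, transpose(fk))           # N x N
    den = [sum(row) for row in scores]           # row-wise normalizer
    return normalize(matmul(scores, v), den)


def attention_linear(q, k, v):
    """phi(Q) (phi(K)^T V): only d x d_v intermediates, linear in N."""
    fq, fk = feature_map(q), feature_map(k)
    kv = matmul(transpose(fk), v)                # d x d_v
    k_sum = [sum(col) for col in zip(*fk)]       # length d
    den = [sum(a * b for a, b in zip(row, k_sum)) for row in fq]
    return normalize(matmul(fq, kv), den)
```

Both functions return the same values up to floating-point rounding; only the cost differs, since the second never forms the N x N matrix.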

    Fine-grained Recognition with Learnable Semantic Data Augmentation

    Fine-grained image recognition is a longstanding computer vision challenge that focuses on differentiating objects belonging to multiple subordinate categories within the same meta-category. Since images belonging to the same meta-category usually share similar visual appearances, mining discriminative visual cues is the key to distinguishing fine-grained categories. Although commonly used image-level data augmentation techniques have achieved great success in generic image classification problems, they are rarely applied in fine-grained scenarios, because their random region-editing behavior is prone to destroying the discriminative visual cues residing in subtle regions. In this paper, we propose diversifying the training data at the feature level to alleviate the discriminative region loss problem. Specifically, we produce diversified augmented samples by translating image features along semantically meaningful directions. The semantic directions are estimated with a covariance prediction network, which predicts a sample-wise covariance matrix to adapt to the large intra-class variation inherent in fine-grained images. Furthermore, the covariance prediction network is jointly optimized with the classification network in a meta-learning manner to alleviate the degenerate solution problem. Experiments on four competitive fine-grained recognition benchmarks (CUB-200-2011, Stanford Cars, FGVC Aircrafts, NABirds) demonstrate that our method significantly improves the generalization performance of several popular classification networks (e.g., ResNets, DenseNets, EfficientNets, RegNets and ViT). Combined with a recently proposed method, our semantic data augmentation approach achieves state-of-the-art performance on the CUB-200-2011 dataset. The source code will be released.
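A rough sketch of feature-level augmentation as described in this abstract: a feature vector is translated along a random direction drawn from a zero-mean Gaussian with a per-sample covariance. The assumptions here are loud ones: the covariance is taken to be diagonal, `variances` merely stands in for the output of the covariance prediction network, and `augment_feature` and `strength` are hypothetical names, not the authors' code.

```python
import random


def augment_feature(feature, variances, strength=0.5, rng=random):
    """Translate a feature vector along a direction sampled from
    N(0, strength * diag(variances)).

    Assumption: diagonal covariance; `variances` stands in for whatever a
    covariance prediction network would output for this sample.
    """
    return [
        f + rng.gauss(0.0, (strength * var) ** 0.5)
        for f, var in zip(feature, variances)
    ]


# Usage: dimensions with larger predicted variance are perturbed more.
rng = random.Random(0)
augmented = augment_feature([1.0, 2.0, 3.0], [1.0, 0.0, 4.0], rng=rng)
```

A dimension with zero predicted variance is left untouched, which is how a per-sample covariance can protect features that should not vary within a class.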

    Latency-aware Unified Dynamic Networks for Efficient Image Recognition

    Dynamic computation has emerged as a promising avenue to enhance the inference efficiency of deep networks. It allows selective activation of computational units, leading to a reduction in unnecessary computations for each input sample. However, the actual efficiency of these dynamic models can deviate from theoretical predictions. This mismatch arises from: 1) the lack of a unified approach due to fragmented research; 2) the focus on algorithm design over critical scheduling strategies, especially in CUDA-enabled GPU contexts; and 3) challenges in measuring practical latency, given that most libraries cater to static operations. Addressing these issues, we unveil the Latency-Aware Unified Dynamic Networks (LAUDNet), a framework that integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping. To bridge the theoretical and practical efficiency gap, LAUDNet merges algorithmic design with scheduling optimization, guided by a latency predictor that accurately gauges dynamic operator latency. We have tested LAUDNet across multiple vision tasks, demonstrating its capacity to notably reduce the latency of models like ResNet-101 by over 50% on platforms such as V100, RTX3090, and TX2 GPUs. Notably, LAUDNet stands out in balancing accuracy and efficiency. Code is available at: https://www.github.com/LeapLabTHU/LAUDNet

    Dynamic Perceiver for Efficient Visual Recognition

    Early exiting has become a promising approach to improving the inference efficiency of deep networks. By structuring models with multiple classifiers (exits), predictions for "easy" samples can be generated at earlier exits, negating the need for executing deeper layers. Current multi-exit networks typically implement linear classifiers at intermediate layers, compelling low-level features to encapsulate high-level semantics. This sub-optimal design invariably undermines the performance of later exits. In this paper, we propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task with a novel dual-branch architecture. A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks. Bi-directional cross-attention layers are established to progressively fuse the information of both branches. Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features. Dyn-Perceiver constitutes a versatile and adaptable framework that can be built upon various architectures. Experiments on image classification, action recognition, and object detection demonstrate that our method significantly improves the inference efficiency of different backbones, outperforming numerous competitive approaches across a broad range of computational budgets. Evaluation on both CPU and GPU platforms substantiates the superior practical efficiency of Dyn-Perceiver. Code is available at https://www.github.com/LeapLabTHU/Dynamic_Perceiver. Comment: Accepted at ICCV 202
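The general early-exiting idea mentioned at the start of this abstract can be sketched as follows. This is not the Dyn-Perceiver dual-branch architecture; it assumes a plain chain of stages with one classifier per stage and a confidence-threshold exit rule, which the abstract does not specify. The names `early_exit_predict`, `stages`, `exits` and `threshold` are hypothetical.

```python
def early_exit_predict(x, stages, exits, threshold=0.9):
    """Run stages in order and stop at the first exit that is confident enough.

    stages: list of callables mapping a hidden state to the next hidden state.
    exits:  list of callables mapping a hidden state to class probabilities.
    Assumption: an exit fires when its top probability reaches `threshold`;
    the last exit always answers. Returns (predicted_class, exit_index).
    """
    h = x
    for i, (stage, classifier) in enumerate(zip(stages, exits)):
        h = stage(h)                      # deeper layers run only if needed
        probs = classifier(h)
        confidence = max(probs)
        is_last = i == len(exits) - 1
        if confidence >= threshold or is_last:
            return probs.index(confidence), i


# Usage with toy stages: the second exit is confident, so the third never runs.
stages = [lambda h: h + 1, lambda h: h + 1, lambda h: h + 1]
exits = [lambda h: [0.6, 0.4], lambda h: [0.05, 0.95], lambda h: [0.5, 0.5]]
prediction, exit_index = early_exit_predict(0, stages, exits)
```

Raising the threshold pushes more samples to deeper exits, which is the knob that trades accuracy against computation.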

    Adaptive Rotated Convolution for Rotated Object Detection

    Rotated object detection aims to identify and locate objects in images with arbitrary orientation. In this scenario, the orientations of objects vary considerably across different images, while multiple orientations of objects exist within an image. This intrinsic characteristic makes it challenging for standard backbone networks to extract high-quality features of these arbitrarily oriented objects. In this paper, we present the Adaptive Rotated Convolution (ARC) module to handle the aforementioned challenges. In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images, and an efficient conditional computation mechanism is introduced to accommodate the large orientation variations of objects within an image. The two designs work seamlessly in the rotated object detection problem. Moreover, ARC can conveniently serve as a plug-and-play module in various vision backbones to boost their representation ability to detect oriented objects accurately. Experiments on commonly used benchmarks (DOTA and HRSC2016) demonstrate that, equipped with our proposed ARC module in the backbone network, the performance of multiple popular oriented object detectors is significantly improved (e.g., +3.03% mAP on Rotated RetinaNet and +4.16% on CFA). Combined with the highly competitive method Oriented R-CNN, the proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.

    Learning to Weight Samples for Dynamic Early-exiting Networks

    Early exiting is an effective paradigm for improving the inference efficiency of deep networks. By constructing classifiers with varying resource demands (the exits), such networks allow easy samples to be output at early exits, removing the need for executing deeper layers. While existing works mainly focus on the architectural design of multi-exit networks, the training strategies for such models are largely left unexplored. The current state-of-the-art models treat all samples the same during training. However, the early-exiting behavior during testing has been ignored, leading to a gap between training and testing. In this paper, we propose to bridge this gap by sample weighting. Intuitively, easy samples, which generally exit early in the network during inference, should contribute more to training early classifiers. The training of hard samples (which mostly exit from deeper layers), however, should be emphasized by the late classifiers. Our work proposes to adopt a weight prediction network to weight the loss of different training samples at each exit. This weight prediction network and the backbone model are jointly optimized under a meta-learning framework with a novel optimization objective. By bringing the adaptive behavior during inference into the training phase, we show that the proposed weighting mechanism consistently improves the trade-off between classification accuracy and inference efficiency. Code is available at https://github.com/LeapLabTHU/L2W-DEN. Comment: ECCV 202
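The per-exit sample weighting described in this abstract can be sketched as a weighted cross-entropy summed over exits. This is only an illustration under stated assumptions: the weights are passed in directly rather than produced by a weight prediction network, the meta-learning optimization is omitted, averaging over the batch is a choice made here, and `weighted_multi_exit_loss` is a hypothetical name.

```python
import math


def weighted_multi_exit_loss(probs, labels, weights):
    """Weighted cross-entropy over all exits.

    probs[k][i]   : class probabilities predicted at exit k for sample i.
    labels[i]     : ground-truth class index of sample i.
    weights[k][i] : weight of sample i at exit k (assumed given here; in the
                    abstract they come from a learned weight prediction network).
    """
    total = 0.0
    for exit_probs, exit_weights in zip(probs, weights):
        for p, y, w in zip(exit_probs, labels, exit_weights):
            total += w * -math.log(p[y])   # per-sample, per-exit weighted loss
    return total / len(labels)             # assumption: average over the batch
```

With all weights equal to 1 this reduces to the usual uniform multi-exit loss, which is the baseline the abstract says treats all samples the same.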

    Emerging role of ubiquitination/deubiquitination modification of PD-1/PD-L1 in cancer immunotherapy

    As members of the immune checkpoint family, PD-1 and its ligand PD-L1 play critical roles in maintaining the balance between autoimmunity and tolerance. The interaction of PD-1/PD-L1 is also involved in tumor evasion inside the tumor microenvironment, caused by reduced T cell activation, proliferation, cytotoxic secretion, and survival. Previous research has shown that the expression level of PD-1/PD-L1 may be regulated by ubiquitin-mediated proteasome degradation, which is an important mode of post-translational modification (PTM). PD-1/PD-L1 ubiquitin modification research in tumor immunotherapy is the subject of the present review, which aims to assess the most recent developments in this area. We offer a short explanation of PD-1/PD-L1 as well as some basic background information on the ubiquitin-proteasome system (UPS), and discuss many routes that target E3s and DUBs, respectively, in the regulation of PD-1/PD-L1 in tumor immunotherapy. In addition, we offer numerous innovative prospective research areas for the future, as well as novel immunotherapy concepts and ideas. Taken together, the information compiled herein should serve as a comprehensive repository of currently available information about tumor immunotherapy, and it should be useful in the design of future studies, as well as the development of potential targets and strategies for future tumor immunotherapy.